Statistical evaluation of SAGE libraries: consequences for experimental design.

Authors

  • Jan M Ruijter
  • Antoine H C Van Kampen
  • Frank Baas
Abstract

Since the introduction of serial analysis of gene expression (SAGE) as a method to quantitatively analyze the differential expression of genes, several statistical tests have been published for the pairwise comparison of SAGE libraries. Testing the difference between the counts of a specific tag in two SAGE libraries is hampered by the fact that each SAGE library is a single measurement: the information on biological variation or experimental precision that such a test needs is not available. In the currently available tests, a measure of this variance is obtained either from simulation or from the properties of the tag distribution. To help SAGE users decide between these tests, five pairwise tests were compared by determining their critical values, that is, the lowest number of tags that, given an observed number of tags in one library, must be found in the other library to give a significant P value. The five tests included in this comparison are SAGE300, the tests described by Madden et al. (Oncogene 15: 1079-1085, 1997) and by Audic and Claverie (Genome Res 7: 986-995, 1997), Fisher's Exact test, and the Z test, which is equivalent to the chi-squared test. The comparison showed that, for SAGE libraries of equal as well as different sizes, SAGE300, Fisher's Exact test, the Z test, and the Audic and Claverie test have critical values within 1.5% of each other. This indicates that these four tests will give essentially the same results when applied to SAGE libraries. The Madden test, which can only be used for libraries of similar size, is more conservative, with critical values 25% higher, probably because the variance measure in its test statistic is not appropriate for hypothesis testing. The consequences for the choice of SAGE library sizes are discussed.
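The critical-value comparison described above can be illustrated with a short Python sketch. This is not the authors' code and covers only two of the five tests: the pooled two-proportion Z test, and the Audic and Claverie probability p(y|x) = (N2/N1)^y (x+y)! / (x! y! (1 + N2/N1)^(x+y+1)). The function names, the significance level of 0.05, and the choice to test the Audic and Claverie upper tail at alpha/2 are assumptions of this sketch. The critical value is taken as the smallest count y in library 2, at or above the size-scaled expectation x·N2/N1, that the test calls significant.

```python
import math


def z_test_p(x: int, y: int, n1: int, n2: int) -> float:
    """Two-sided P value of the pooled two-proportion Z test.

    x, y: counts of one tag in library 1 and library 2.
    n1, n2: total number of tags sequenced in each library.
    Without continuity correction, z**2 equals the Pearson chi-squared
    statistic of the 2x2 table, so the two tests give the same P value.
    """
    pooled = (x + y) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    if se == 0.0:
        return 1.0  # the tag was not seen in either library
    z = (x / n1 - y / n2) / se
    # 2 * (1 - Phi(|z|)), written with the complementary error function
    return math.erfc(abs(z) / math.sqrt(2.0))


def ac_prob(x: int, y: int, n1: int, n2: int) -> float:
    """Audic-Claverie p(y | x): probability of y tags in library 2 given
    x tags in library 1. Evaluated in log space to avoid overflow."""
    r = n2 / n1
    log_p = (y * math.log(r)
             + math.lgamma(x + y + 1) - math.lgamma(x + 1) - math.lgamma(y + 1)
             - (x + y + 1) * math.log1p(r))
    return math.exp(log_p)


def ac_upper_tail(x: int, y: int, n1: int, n2: int) -> float:
    """P(Y >= y | x) under the Audic-Claverie distribution."""
    return 1.0 - sum(ac_prob(x, k, n1, n2) for k in range(y))


def z_significant(x, y, n1, n2, alpha=0.05):
    return z_test_p(x, y, n1, n2) < alpha


def ac_significant(x, y, n1, n2, alpha=0.05):
    # Assumption: alpha is split over two tails; only the upper tail is tested.
    return ac_upper_tail(x, y, n1, n2) < alpha / 2


def critical_value(x, n1, n2, is_significant, max_y=10_000):
    """Smallest y at or above the size-scaled expectation x * n2 / n1 that
    is_significant accepts; None if no y up to max_y qualifies."""
    y = math.ceil(x * n2 / n1)
    while y <= max_y:
        if is_significant(x, y, n1, n2):
            return y
        y += 1
    return None


if __name__ == "__main__":
    for n2 in (50_000, 100_000):  # equal and unequal library sizes
        for x in (5, 10, 50):
            zc = critical_value(x, 50_000, n2, z_significant)
            acc = critical_value(x, 50_000, n2, ac_significant)
            print(f"n1=50000, n2={n2}, x={x}: Z test {zc}, Audic-Claverie {acc}")
```

Running the file prints critical values for hypothetical 50,000-tag libraries of equal and unequal size, the kind of table that bears on the choice of library sizes. How alpha is split between the tails is an assumption here, and changing it moves the critical values noticeably at low counts, so this output is not meant to reproduce the paper's comparison. The remaining tests (SAGE300, Madden et al., Fisher's Exact test) could be added by writing their own is_significant function and passing it to critical_value.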

Similar resources

Identification and prevention of a GC content bias in SAGE libraries.

Serial Analysis of Gene Expression (SAGE) is becoming a widely used gene expression profiling method for the study of development, cancer and other human diseases. Investigators using SAGE rely heavily on the quantitative aspect of this method for cataloging gene expression and comparing multiple SAGE libraries. We have developed additional computational and statistical tools to assess the qual...

Presenting a Framework for Supporting Life-long Learning in Iranian public libraries and Its validation

Purpose: Since nowadays public libraries are considered lifelong learning centers, these centers must have the required standards and conditions to support lifelong learning in order that they could help society members to achieve their personal and professional learning more effectively. Accordingly, it is necessary to develop and provide a mechanism to support lifelong learning in public libr...

Statistical modeling of sequencing errors in SAGE libraries.

MOTIVATION Sequencing errors may bias the gene expression measurements made by Serial Analysis of Gene Expression (SAGE). They may introduce non-existent tags at low abundance and decrease the real abundance of other tags. These effects are increased in the longer tags generated in LongSAGE libraries. Current sequencing technology generates quite accurate estimates of sequencing error rates. He...

Full transcriptome analysis of rhabdomyosarcoma, normal, and fetal skeletal muscle: statistical comparison of multiple SAGE libraries.

Rhabdomyosarcoma (RMS) is the most frequent soft tissue sarcoma in children. Improved treatment strategies have increased overall survival, but the response of approximately one-third of the patients is still poor. To increase the knowledge of RMS pathogenesis, we performed the first full transcriptome analysis of RMS using serial analysis of gene expression (SAGE). With a G-test for the simult...

Mesothelin is overexpressed in the vast majority of ductal adenocarcinomas of the pancreas: identification of a new pancreatic cancer marker by serial analysis of gene expression (SAGE).

PURPOSE Effective new markers of pancreatic carcinoma are urgently needed. In a previous analysis of gene expression in pancreatic adenocarcinoma using serial analysis of gene expression (SAGE), we found that the tag for the mesothelin mRNA transcript was present in seven of eight SAGE libraries derived from pancreatic carcinomas but not in the two SAGE libraries derived from normal pancreatic ...

Journal:
  • Physiological Genomics

Volume 11, Issue 2

Publication date: 2002